Problem Description

The nyc-weather-13.csv file available from http://bit.ly/nyc-weather-13 contains hourly meteorological data from 2013 for each of the three New York City airports:

  • EWR - Newark Liberty International Airport
  • JFK - John F. Kennedy International Airport
  • LGA - LaGuardia Airport

Here we will fix the data by replacing the invalid temperature reading recorded at JFK in May with the mean of the two hourly temperature readings on either side of it. Apart from the added code that replaces the value, the rest of the R Markdown document is unchanged. Note the inline R code in the Discussion, which produces an extra sentence if the outlier is present and omits it if it is not.


Get the data

We will use the read_csv function from the readr package, which is part of the tidyverse meta-package, to read this data from the web. We name the resulting data frame weather.

#install.packages("tidyverse")
#install.packages("knitr")
#install.packages("rmarkdown")
#install.packages("plotly")
library(tidyverse)
library(knitr)
library(rmarkdown)
library(plotly)
weather <- read_csv("http://bit.ly/nyc-weather-13")

Fix the data

# To replace this one value, we first need to find where it is in the data.
# Sorting by temperature in View(weather) and noting the row number shows
# that it is in row 11778.
weather <- weather %>%
  mutate(temp = ifelse(month == 5 & day == 9 & hour == 2 & origin == "JFK", 
    yes = (temp[11777] + temp[11779]) / 2,
    no = temp)
  )
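# Suzy Renn also provided this base R solution, which has less hard coding.
# It is an alternative to the code above and is not run here: use one fix
# or the other, not both.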
bad <- which(weather$temp == (min(weather[weather$month==5 , "temp"], na.rm =TRUE)))
weather$temp[bad] <- mean(c(weather$temp[bad - 1], weather$temp[bad + 1]))
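
The idea behind both fixes can be tried on a small made-up series before touching the real data. The sketch below uses a hypothetical vector, toy_temp, that is not taken from the CSV. It assumes the bad reading is the unique minimum and that the rows just before and after it are the adjacent hours at the same airport.

```r
# Hypothetical hourly temperatures with one clearly invalid reading.
toy_temp <- c(57.2, 56.8, 13.1, 55.9, 55.4)

# Locate the suspicious value programmatically instead of reading
# the row number off View().
bad <- which.min(toy_temp)

# Replace it with the mean of the readings one position before and after.
toy_temp[bad] <- mean(c(toy_temp[bad - 1], toy_temp[bad + 1]))

bad
toy_temp
```

If the bad value were the first or last element, one of the neighbors would not exist, so a real fix should check for that case.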

Problem 1

Produce a plot exploring the relationship between month and temp.

Solution: month and temp are both quantitative variables, so we may start by looking at a scatterplot:

ggplot(data = weather, 
  mapping = aes(x = month, y = temp)) +
  geom_point()
Warning: Removed 1 rows containing missing values (geom_point).

This gives us a rough layout of the data. To better understand the variability, a (side-by-side) boxplot is preferred.

ggplot(data = weather, 
  mapping = aes(x = month, y = temp)) +
  geom_boxplot()
Warning: Continuous x aesthetic -- did you forget aes(group=...)?
Warning: Removed 1 rows containing non-finite values (stat_boxplot).

This isn’t the plot that we want, and the first warning message suggests how to proceed with a continuous x aesthetic. The second warning tells us that one temperature value is missing from the data.

ggplot(data = weather, 
  mapping = aes(x = month, group = month, y = temp)) +
  geom_boxplot()
Warning: Removed 1 rows containing non-finite values (stat_boxplot).

This isn’t quite the plot we want either, since the x axis is on a continuous scale but month is discrete. We can also use the warning=FALSE chunk option to omit the warning about missing values.

plot1 <- ggplot(data = weather, 
  mapping = aes(x = month, group = month, y = temp)) +
  geom_boxplot() +
  scale_x_continuous(breaks = 1:12)
plot1
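
Another way to get a discrete x axis, offered here only as a sketch, is to convert month to a factor before plotting. The data frame toy and the column month_f below are made up for illustration. The ggplot2 call runs only if that package is installed.

```r
# Toy data standing in for the month and temp columns of weather (made up).
toy <- data.frame(
  month = c(1, 1, 2, 2, 12, 12),
  temp  = c(30.0, 35.1, 33.8, 39.9, 41.0, 28.9)
)

# Treat month as categorical, with all 12 levels in calendar order.
toy$month_f <- factor(toy$month, levels = 1:12, labels = month.abb)

levels(toy$month_f)
table(toy$month_f)

# With a discrete x aesthetic, geom_boxplot() draws one box per level,
# so neither group = month nor scale_x_continuous() is needed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = month_f, y = temp)) +
    ggplot2::geom_boxplot()
}
```

The trade-off is that the axis then shows month labels rather than numbers, which may or may not suit later steps that treat month as numeric.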

Problem 2

Calculate the minimum temperature recorded for each month across all three airports.

min_month_temp <- weather %>%
  group_by(month) %>%
  summarize(min_temp = min(temp))
min_month_temp
# A tibble: 12 x 2
   month min_temp
   <int>    <dbl>
 1     1    10.94
 2     2    15.98
 3     3    26.06
 4     4    30.92
 5     5    42.98
 6     6    53.96
 7     7    64.04
 8     8       NA
 9     9    48.02
10    10    33.08
11    11    21.02
12    12    17.96

If we don’t like the “raw” output that is produced by default with a table, we can pass the data frame into the kable function in the knitr package or the paged_table function in the rmarkdown package to get nicer output:

kable(min_month_temp)
month  min_temp
    1     10.94
    2     15.98
    3     26.06
    4     30.92
    5     42.98
    6     53.96
    7     64.04
    8        NA
    9     48.02
   10     33.08
   11     21.02
   12     17.96

This shows that the minimum temperature for August is missing, because one temperature reading in that month is missing from the data. If you look at ?min, you can see that the function has an na.rm argument, which is FALSE by default. We will set it to TRUE now:

min_month_temp <- weather %>%
  group_by(month) %>%
  summarize(min_temp = min(temp, na.rm = TRUE))
paged_table(min_month_temp)
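
The effect of na.rm is easy to see on a tiny vector. The readings in aug_temp below are made up and are not values from the data set.

```r
# Made-up readings for one month, one of them missing.
aug_temp <- c(71.1, NA, 68.0, 73.9)

min(aug_temp)                # NA: a single missing value propagates
min(aug_temp, na.rm = TRUE)  # 68: missing values are dropped first
sum(is.na(aug_temp))         # 1: number of missing readings
```

Counting the missing values with is.na() before dropping them is a quick way to confirm that only a handful of readings are affected.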

Problem 3

Produce a plot showing how minimum temperature varies across the 12 months.

ggplot(data = min_month_temp,
  mapping = aes(x = month, y = min_temp)) +
  geom_point()

Problem 4

Calculate the minimum temperature recorded for each month FOR EACH OF the three airports.

min_month_temp2 <- weather %>%
  group_by(month, origin) %>%
  summarize(min_temp = min(temp, na.rm = TRUE))
paged_table(min_month_temp2)
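
For comparison, the same kind of two-variable grouped minimum can be sketched in base R with aggregate(). The data frame toy below is made up. Note that the formula interface of aggregate() drops rows with missing values by default, which plays the role of na.rm = TRUE here.

```r
# Made-up readings for two months at two airports, one of them missing.
toy <- data.frame(
  month  = c(1, 1, 1, 1, 2, 2, 2, 2),
  origin = c("EWR", "EWR", "JFK", "JFK", "EWR", "EWR", "JFK", "JFK"),
  temp   = c(28.9, 24.1, 30.0, NA, 33.1, 35.6, 32.0, 37.9)
)

# One row per (month, origin) pair, holding the minimum temp in that group.
min_by_group <- aggregate(temp ~ month + origin, data = toy, FUN = min)
min_by_group
```

The result has one row for each combination of month and origin, which is the same shape that group_by(month, origin) followed by summarize() produces.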

Problem 5

Explore the multivariate relationship between month, airport, and minimum temperature via a statistical graphic.

plot5 <- ggplot(data = min_month_temp2,
  mapping = aes(x = month, y = min_temp, color = origin)) +
  geom_line() +
  geom_point()

Showing off

We can easily turn any of the plots above into interactive graphics using the plotly package and its ggplotly function. Hover over the plots!

ggplotly(plot1)
ggplotly(plot5, tooltip = c("x", "y", "color"))

Discussion: In general, we see that the winter and fall months have the most variability, with the summer having the least. This makes sense, since New York City has both some very cold winter days and some hot summer days, with a range of values in between.
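
The conditional sentence mentioned in the Problem Description can be sketched as a small helper that returns either an extra sentence or an empty string. The function name outlier_note, the cutoff of 40, and the exact wording are illustrative assumptions, not a definitive implementation.

```r
# Hypothetical helper: returns an extra sentence only when the May minimum
# looks implausibly low. The cutoff of 40 is an illustrative assumption.
outlier_note <- function(may_min, cutoff = 40) {
  if (!is.na(may_min) && may_min < cutoff) {
    paste0(
      "There is a strange outlier in May showing a minimum temperature of ",
      may_min, "."
    )
  } else {
    ""
  }
}

outlier_note(13.1)   # the sentence is produced
outlier_note(42.98)  # empty string, so nothing is added to the text
```

In an R Markdown document, such a helper could be called from inline R code, for example with min_month_temp$min_temp[5] as its argument, so the sentence disappears once the data has been fixed.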
